Grep and Regex

Files

Download the zip file files-for-cats.zip to perform the examples on this page.


Download and unzip with the following:

wget https://csc222.jcor.dev/files/files-for-cats.zip
unzip files-for-cats.zip -d files-for-cats

Grep

Grep is a command used to search through the contents of files using regular expressions. A regular expression is a sequence of characters that represents a search pattern.

Grep stands for "Globally search for a Regular Expression and Print."

A typical format for using the grep command is as follows:

grep [flags] [search string] [file or files]

Common Flags

The following are some common flags one may use with grep and their meanings. This is not an exhaustive list. For more information on the flags, run the command man grep.

Flag        Meaning
-i          ignore case when matching the search string
-r          search recursively through a directory and the files in it
-c          count the number of lines in each file that contain a match
-o          print only the matched text, with each match on its own line
-P          allow the use of PCRE (Perl Compatible Regular Expressions)

Concerning the last flag, we will be using Perl Compatible Regular Expressions because they greatly enhance pattern matching and allow for more concise expressions than grep's default basic regular expressions.

Any of the above flags can be combined, as will be shown in a few examples that follow.
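As a minimal sketch of how the flags change a search, the following builds a small throwaway file and runs the same search three ways. The file contents and the variable names are made up for illustration and are not part of files-for-cats.

```shell
# Hypothetical sample file, created only for this sketch.
sample="$(mktemp)"
printf '%s\n' 'Cats nap all day.' 'A cat naps too.' 'No felines here.' > "$sample"

# Case-sensitive search: only the line containing lowercase "cat" matches.
case_sensitive="$(grep cat "$sample")"

# -i ignores case, so the "Cats" line matches as well.
ignore_case="$(grep -i cat "$sample")"

# -i and -c combined: count the lines that match, ignoring case.
line_count="$(grep -ic cat "$sample")"

echo "$case_sensitive"
echo "$ignore_case"
echo "matching lines: $line_count"   # prints "matching lines: 2"

rm -f "$sample"
```

Writing -ic is the same as writing -i -c; short flags can be bundled behind a single dash.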

Tip: grep on macOS

If you’re using macOS, the built-in grep command does not highlight matches when printing results the way it does on Ubuntu. To get highlighted matches, install GNU grep with the command brew install grep. If you do not have Homebrew installed, installation instructions are available at https://brew.sh/.

After installing, use the command ggrep in place of grep.

You can use the flag --color with ggrep to force text highlighting.
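Some grep builds do not accept the -P flag at all; the BSD grep that ships with macOS is a common example. The following is a small sketch for checking whether the grep on your PATH understands -P before working through the regex examples. The variable name pcre_support is made up for illustration.

```shell
# Feed grep a line containing a digit and ask it to match \d using -P.
# If grep rejects -P or cannot match, the test fails and we report "no".
if printf 'a1\n' | grep -qP '\d' 2>/dev/null; then
  pcre_support="yes"
else
  pcre_support="no"
fi

echo "grep -P supported: $pcre_support"
```

If this prints "no" on macOS, using ggrep as described above is one way to get -P support.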

Grep Examples

For each of these examples, download the files at https://csc222.jcor.dev/files/files-for-cats.zip, unzip them, and navigate into the top level directory in a terminal.

# Downloading and extracting the files for the examples.
wget https://csc222.jcor.dev/files/files-for-cats.zip
unzip files-for-cats.zip -d files-for-cats
rm files-for-cats.zip
cd files-for-cats 

Example 1

  1. Find Cat in main.py

    grep Cat main.py

    Expected result:

    print('Hello, Cats of the Internet!')
  2. Find cat in main.py

    grep cat main.py

    Expected result:

    # nothing appears because it is case sensitive
  3. Ignore case with the -i flag.

    grep -i cat main.py

    Expected result:

    print('Hello, Cats of the Internet!')

Example 2

  1. Navigate into the docs directory of files-for-cats

  2. Find more instances of cat in guide.txt and notes.txt

    grep cat guide.txt notes.txt 

    Expected Output:

    guide.txt:1. The cat owns you, not the other way around.
    guide.txt:3. If a cat sits on your laptop, you are no longer allowed to work.
    notes.txt:- Cats sleep 16 hours a day. Be like a cat.
    notes.txt:- Main.py is where you simulate cat-human communication.

  3. Find even more instances when case is ignored.

    grep -i cat guide.txt notes.txt

    Expected Output:

    guide.txt:CAT OWNER'S GUIDE:
    guide.txt:1. The cat owns you, not the other way around.
    guide.txt:3. If a cat sits on your laptop, you are no longer allowed to work.
    notes.txt:Observations from Cat HQ:
    notes.txt:- Cats sleep 16 hours a day. Be like a cat.
    notes.txt:- Main.py is where you simulate cat-human communication.

Example 3

  1. Go back to the top level of ‘files-for-cats’

  2. Use -r to search recursively through the files in a directory and its sub-directories

    grep -r cat docs

    Expected Output:

    docs/guide.txt:1. The cat owns you, not the other way around.
    docs/guide.txt:3. If a cat sits on your laptop, you are no longer allowed to work.
    docs/notes.txt:- Cats sleep 16 hours a day. Be like a cat.
    docs/notes.txt:- Main.py is where you simulate cat-human communication.

Example 4

  1. You can combine the flags. In the following, -o prints each occurrence on its own line and -c shows the count of lines on which occurrences appear.

  2. Try each of these commands out.

grep -irc meow .
grep -iro meow .
# pipe the result of grep into the word count command
# -l will count the number of lines input into word count
grep -iro meow . | wc -l   
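The difference between counting lines with -c and counting matches with -o piped into wc -l is easiest to see on a small directory. The following sketch builds a throwaway directory; its file names and contents are hypothetical and are not part of files-for-cats.

```shell
# Hypothetical directory tree, created only for this sketch.
dir="$(mktemp -d)"
mkdir "$dir/sub"
printf '%s\n' 'meow meow' 'purr' > "$dir/a.txt"
printf '%s\n' 'Meow' 'MEOW meow meow' > "$dir/sub/b.txt"

# -c reports, per file, how many LINES contain a match.
per_file="$(grep -irc meow "$dir" | sort)"

# Without -o, each matching line is printed once, so wc -l counts lines.
matching_lines="$(grep -ir meow "$dir" | wc -l | tr -d ' ')"

# With -o, every match is printed on its own line, so wc -l counts matches.
total_matches="$(grep -iro meow "$dir" | wc -l | tr -d ' ')"

echo "$per_file"
echo "matching lines: $matching_lines"   # prints "matching lines: 3"
echo "total matches:  $total_matches"    # prints "total matches:  6"

rm -rf "$dir"
```

The tr -d ' ' step only strips the padding that some versions of wc put in front of the number.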

Regex

When using many of the characters in the tables below in our regular expressions, we pass the -P flag to grep to be sure they are recognized. The -P flag indicates that we are using Perl Compatible Regular Expressions (PCRE). For more on PCRE, visit https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions.

Basic Characters

Character   Meaning
.           Any character except newline
\d          Digit (0–9)
\D          Not a digit (not 0–9)
\w          Word character (a–z, A–Z, 0–9, _)
\W          Not a word character
\s          Whitespace (space, tab, newline)
\S          Not whitespace

Examples

Each of these examples uses the files-for-cats directory. Test them on your machine to see the outputs.

# finds groups of 3 characters 
grep -P "..." docs/contacts.txt

# shows the same as above, but prints each match on its own line
grep -oP "..." docs/contacts.txt    

# finds the letter s and then whitespace
grep -P "s\s" docs/contacts.txt     

# finds three digits, then a dash, then four digits
grep -P "\d\d\d-\d\d\d\d" docs/contacts.txt     

# finds a word character, whitespace, then an open parenthesis
grep -P "\w\s\("  docs/contacts.txt    
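If you want to see exactly which text each pattern matches, adding -o helps. The following is a small sketch that uses a made-up contacts file; the names and numbers in it are hypothetical and do not come from docs/contacts.txt. It assumes a grep that supports -P.

```shell
# Hypothetical contact lines, created only for this sketch.
contacts="$(mktemp)"
printf '%s\n' 'Tom Whiskers (555) 123-4567' 'Luna 555-9876' 'no digits here' > "$contacts"

# Three digits, a dash, then four digits; -o prints only the matched text.
seven_digit="$(grep -oP '\d\d\d-\d\d\d\d' "$contacts")"

# A word character, whitespace, then a literal open parenthesis.
before_paren="$(grep -oP '\w\s\(' "$contacts")"

echo "$seven_digit"     # prints 123-4567 and 555-9876 on separate lines
echo "$before_paren"    # prints "s ("

rm -f "$contacts"
```

The patterns here are wrapped in single quotes so the shell passes the backslashes to grep untouched.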

Meta Characters

Character   Meaning
[]          Character class: matches any one of the characters in the brackets
[^ ]        Negated character class: matches any one character not in the brackets
{}          Quantifier: matches an exact number (or range) of repetitions
()          Used for grouping
\           Escape character (escapes the meaning of metacharacters)
|           OR
?           Matches 0 or 1 of something (and other uses)
*           Matches 0 or more of something
+           Matches 1 or more of something

Examples

Each of the following examples is performed from the top level of the files-for-cats directory.

# find the phone numbers in the format (###) ###-####
grep -P "\(\d{3}\)\s\d{3}-\d{4}" docs/contacts.txt   

# find entries with first and last names
grep -P " \w* \w* " docs/contacts.txt   

# find a capital letter, then any number of lowercase letters, then a space, then any number of capital and lowercase letters 
grep -P "[A-Z][a-z]* [a-zA-Z]*" docs/contacts.txt   

# any instances of Purr or Meow
grep -P "(Purr|Meow)" docs/contacts.txt    

# any instances of Purr or Meow where the first letter of Purr may be upper or lower case
grep -P "([pP]urr|Meow)" docs/contacts.txt          

# find instances of a . followed by at least two alphabetic characters
grep -P "\.[a-zA-Z]{2,}" docs/contacts.txt           

# find instances of a . followed by at least two but at most 3 alphabetic characters
grep -P "\.[a-zA-Z]{2,3}" docs/contacts.txt 

# lines containing any character that is not a lowercase letter
grep -P "[^a-z]" docs/contacts.txt

# find emails
grep -P "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" docs/contacts.txt    
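Combining these patterns with -o turns grep into a simple extraction tool. The following is a rough sketch on a made-up contacts file; the names, numbers, and addresses are hypothetical and do not come from docs/contacts.txt. It assumes a grep that supports -P.

```shell
# Hypothetical contact lines, created only for this sketch.
contacts="$(mktemp)"
printf '%s\n' \
  'Tom Whiskers (555) 123-4567 tom@example.com' \
  'Luna Purrington luna.p@cats.example.org' \
  'Office: 555-0000' > "$contacts"

# Phone numbers in the format (###) ###-####, using {n} quantifiers.
phones="$(grep -oP '\(\d{3}\)\s\d{3}-\d{4}' "$contacts")"

# Email addresses, using character classes with + and {2,}.
emails="$(grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$contacts")"

echo "$phones"   # prints "(555) 123-4567"
echo "$emails"   # prints tom@example.com and luna.p@cats.example.org

rm -f "$contacts"
```

The line "Office: 555-0000" is skipped by the phone pattern because it has no parenthesized area code.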

Searching For Boundaries

Character   Meaning
\b          Word boundary (occurs when a word character is adjacent to a non-word character)
\B          Not a word boundary
^           Beginning of a line
$           End of a line

Examples

These examples are performed at the root of files-for-cats.

# match furr or Furr only when there is a word boundary before it
grep -P "\b[fF]urr" docs/contacts.txt   

# lines that begin with a - 
grep -P "^-" docs/contacts.txt    

# lines that end with .com
grep -P "\.com$" docs/contacts.txt      
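To see how \b and \B differ, it can help to run both against the same input. The following sketch uses a made-up file; its lines are hypothetical and do not come from docs/contacts.txt. It assumes a grep that supports -P.

```shell
# Hypothetical lines, created only for this sketch.
f="$(mktemp)"
printf '%s\n' '- Furrball: furr@cats.example.com' 'Mr. Fluffurr' '- catfurr.net' > "$f"

# \b requires a word boundary before the f or F: matches "Furrball" and "furr@".
with_boundary="$(grep -oP '\b[fF]urr' "$f")"

# \B requires NO boundary, so the match must start in the middle of a word.
without_boundary="$(grep -oP '\B[fF]urr' "$f")"

# ^ and $ anchor the pattern to the start and end of a line.
dash_lines="$(grep -cP '^-' "$f")"
com_lines="$(grep -cP '\.com$' "$f")"

echo "$with_boundary"       # prints Furr and furr on separate lines
echo "$without_boundary"    # prints furr twice (from Fluffurr and catfurr)
echo "lines starting with -: $dash_lines"     # prints 2
echo "lines ending with .com: $com_lines"     # prints 1

rm -f "$f"
```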

Extra Practice

For extra practice with grep and regex, check out the following tools. OverTheWire Bandit will give you more terminal practice as well.